-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL][Core] SampleExec Operator Native Support #5856
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
…gayangya/sampleExec
Run Gluten Clickhouse CI |
@rui-mo / @zhouyuan / @zhli1142015 help review |
gluten-core/src/main/scala/org/apache/gluten/execution/SampleExecTransformer.scala
Show resolved
Hide resolved
Will Sampling with replacement (withReplacement is set to true) fallback? how complex if we add support to Velox? how useful it is? |
sample with relpacement allow pick up same row multiple times, that means it actually not simple filter but re-generating the data set, that is not worked by simple convert to a filter rand expression. I don't yet figur out a good way to supprot that part unless to make a new about how useful |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
@zhli1142015 / @zhouyuan help merge |
What changes were proposed in this pull request?
The PR introduce the
SampleExec
native support (SampleExecTransformer
) for velox backend, basically it transfer theSampleExec
spark physical operator to veloxProjectFilter
operator withrand(seed) < fraction
expression.The feature is under feature flag
spark.gluten.sql.columnarSampleEnabled
with default is false now.SampleExec support two sample way
Right now, we only support for samling without replacement.
(Please fill in changes proposed in this fix)
(Fixes: #5315)
How was this patch tested?
locally test with turning on feature flag
A case use
df.sample(0.5, 10)
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)